Reducing storage requirements for biological sequence comparison
Abstract
MOTIVATION: Comparison of nucleic acid and protein sequences is a fundamental tool of modern bioinformatics. A dominant method of such string matching is the 'seed-and-extend' approach, in which occurrences of short subsequences called 'seeds' are used to search for potentially longer matches in a large database of sequences. Each such potential match is then checked to see if it extends beyond the seed. To be effective, the seed-and-extend approach needs to catalogue seeds from virtually every substring in the database of search strings. Projects such as mammalian genome assemblies and large-scale protein matching, however, have such large sequence databases that the resulting list of seeds cannot be stored in RAM on a single computer. This significantly slows the matching process.

RESULTS: We present a simple and elegant method in which only a small fraction of seeds, called 'minimizers', needs to be stored. Using minimizers can speed up string-matching computations by a large factor while missing only a small fraction of the matches found using all seeds.
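The abstract does not spell out how minimizers are chosen, so the following is only a minimal sketch under a commonly used definition, assumed here rather than taken from the text: slide a window over every w consecutive k-mers and keep the smallest k-mer in each window. The function name `minimizers`, the parameters `k` and `w`, the lexicographic ordering, and the leftmost tie-break are all illustrative assumptions.

```python
def minimizers(seq: str, k: int, w: int) -> list[tuple[int, str]]:
    """Return the sorted, de-duplicated (position, k-mer) pairs selected as
    minimizers of seq.

    Assumed definition (not stated in the abstract): in every window of w
    consecutive k-mers, the lexicographically smallest k-mer is selected,
    with ties broken by the leftmost position.
    """
    n_kmers = len(seq) - k + 1
    if k <= 0 or w <= 0 or n_kmers < w:
        return []

    # Every k-mer is a candidate seed; only a subset will be kept.
    kmers = [seq[i:i + k] for i in range(n_kmers)]

    chosen = set()
    for start in range(n_kmers - w + 1):
        window = range(start, start + w)
        # Smallest k-mer in this window, leftmost on ties.
        best = min(window, key=lambda i: (kmers[i], i))
        chosen.add((best, kmers[best]))
    return sorted(chosen)


if __name__ == "__main__":
    # 6 candidate 3-mers, of which 3 are kept as minimizers.
    print(minimizers("ACGTACGT", k=3, w=3))
```

Under this assumed definition, two sequences that share an exact substring of at least w + k - 1 characters select the same minimizer from that substring, which is why an index built from minimizers alone can still seed most matches while storing far fewer entries than an index of all k-mers.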
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
Journal: Bioinformatics
Volume 20, Issue 18
Pages: -
Publication date: 2004